JTB | CNN Realizes The Visualization Of Protein-peptide Binding Features To Predict Their Binding Si

October 29, 2020

Today we introduce a paper by Wafaa Wardah et al. of the University of the South Pacific, published in the Journal of Theoretical Biology (JTB): "Predicting protein-peptide binding sites with a deep convolutional neural network". Predicting protein-peptide binding sites plays an important role in disease prevention and drug development. However, existing prediction methods perform poorly in actual prediction; their sensitivity in particular does not even reach 50%. This paper presents a method that predicts protein-peptide binding sites by using a CNN framework to process "visualized" protein feature data. The authors use a sliding window to transform the initial protein feature data into "visualized" matrix information, which is then fed to the CNN for training. A fully connected network outputs the final prediction, and Bayesian optimization is embedded in the CNN framework to tune the hyper-parameters, so that the model achieves strong results on the test set.

1. Research background

Studying the interaction between proteins and peptides is of great significance in bioinformatics, and these interactions can be analyzed through the structures of protein-peptide complexes. However, only a small fraction of such complex structures is available, and analyzing the interactions through biological experiments is both costly and inefficient. Predicting the binding regions of proteins and peptides computationally can therefore greatly assist experimental research.

Existing methods for predicting protein-peptide binding sites perform well on experimental data sets, but show poor accuracy and sensitivity for binding residues in actual prediction. To address this problem, the authors applied a CNN framework to "visualized" feature data.

2. Models and methods

Fig. 1 Framework of protein-peptide binding site prediction model based on CNN

2.1 Feature selection and preprocessing.

For feature selection, the authors used several groups of features that discriminate well in predicting protein-peptide binding sites, such as half-sphere exposure (HSE), secondary structure (SS), accessible surface area (ASA) and the position-specific scoring matrix (PSSM). The authors then concatenated these feature groups into a single numerical vector of size [1, 38], so that each residue is described by 38 values.

2.2 "Visualization" feature transformation.

To feed protein features into the CNN framework, they must first be transformed into "visual" matrix information. The authors used a sliding window of fixed size (7 in this paper, i.e. a [1, 7] window over the chain) to represent each residue in the protein chain together with its three neighbors on the left and three neighbors on the right. The residue in the middle is thus represented by the features of seven residues, giving a matrix of [7, 38] per residue.

Figure 2 Sliding window method
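
The sliding window step can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' code: the function name `sliding_windows` is hypothetical, the feature values are toy numbers, and zero-padding at the chain ends is an assumption, since this summary does not say how terminal residues are handled.

```python
import numpy as np

def sliding_windows(features: np.ndarray, window: int = 7) -> np.ndarray:
    """Return one [window, n_features] matrix per residue.

    features: array of shape [n_residues, n_features], one row per residue.
    Chain ends are zero-padded (an assumption, not taken from the paper).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so the target residue sits in the middle")
    half = window // 2
    # Add `half` rows of zeros before and after the chain.
    padded = np.pad(features, ((half, half), (0, 0)), mode="constant")
    # Window i covers padded rows i .. i+window-1, centred on residue i.
    return np.stack([padded[i:i + window] for i in range(features.shape[0])])

# Toy chain: 10 residues, 38 feature values each (values are arbitrary).
chain = np.arange(10 * 38, dtype=float).reshape(10, 38)
images = sliding_windows(chain)
print(images.shape)  # (10, 7, 38)
```

Each `images[i]` is the [7, 38] matrix for residue `i`; for a residue far from the chain ends, its middle row is that residue's own 38 values and the rows above and below are its three neighbors on each side.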

2.3 Model training.

The CNN framework comprises two convolution layers, one pooling layer and one fully connected layer. The first convolution layer applies 256 [3, 3] convolution kernels to the "visualized" feature matrix ([7, 38]), producing 256 feature maps of size [5, 36], which are then passed through the rectified linear unit (ReLU) activation function. The second convolution layer also has 256 kernels, so 256 feature maps of size [4, 35] are obtained after convolution, and ReLU is applied again. The pooling layer uses a [2, 2] window and yields 256 pooled feature maps of size [2, 17]. Finally, the 256 matrices of [2, 17] are flattened into a vector of [1, 8704] (256 × 2 × 17) and fed to the fully connected network, which outputs a prediction vector of [1, 2] indicating whether the residue passed into the CNN is a binding residue. In addition, the authors use Bayesian optimization to tune the hyper-parameters of the CNN framework, including the number of convolution kernels in the two convolution layers and the learning rate used by the optimizer when updating the network weights.

Figure 3 Training process of "Visualization" feature data
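
The layer sizes quoted in section 2.3 can be checked with simple shape arithmetic. The sketch below assumes unpadded, stride-1 convolutions and non-overlapping pooling that drops any remainder. The [2, 2] kernel size of the second convolution layer is inferred from the reported change from [5, 36] to [4, 35]; it is not stated in the text. The helper names are illustrative only.

```python
def valid_conv_shape(shape, kernel):
    """Output (rows, cols) of an unpadded, stride-1 convolution."""
    return (shape[0] - kernel[0] + 1, shape[1] - kernel[1] + 1)

def pool_shape(shape, window):
    """Output (rows, cols) of non-overlapping pooling; leftover rows/cols are dropped."""
    return (shape[0] // window[0], shape[1] // window[1])

n_filters = 256                       # kernels per convolution layer, as reported
x = (7, 38)                           # one "visualized" residue matrix
c1 = valid_conv_shape(x, (3, 3))      # first convolution layer
c2 = valid_conv_shape(c1, (2, 2))     # second layer; kernel size is an inference
p = pool_shape(c2, (2, 2))            # pooling layer
flat = n_filters * p[0] * p[1]        # length of the flattened vector

print(c1, c2, p, flat)  # (5, 36) (4, 35) (2, 17) 8704
```

The odd column count 35 is why pooling gives 17 columns rather than 17.5: the last column is discarded under the assumption made here.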

3. Experimental results

3.1 Comparison of predicted results with actual results.

The binding residues that the authors' deep CNN framework predicts for actual protein chains from the "visual" feature matrices are almost the same as those determined experimentally, which shows that the method predicts binding residues well in practical use. In Figure 4 (a), the upper chain shows the binding residues in the actual protein chain, and the lower chain shows the binding residues predicted by the deep CNN framework. Panels (b) and (c) are computer-generated renderings of the protein binding sites: (b) shows the actual binding sites and (c) shows the predicted ones. The method thus covers the binding residue region accurately.

Figure 4 Predicted and actual results

3.2 Comparison with existing methods.

Table 1 compares the authors' method with several top-ranked methods. In terms of AUC, the authors' method (Visual) and the SPRINT-Str method are much better than the other methods. SPRINT-Str has a higher AUC than the authors' method, but its sensitivity (the detection rate for binding residues) is much lower.

Table 1 Comparison of the top 7 methods
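
A method can have a higher AUC and still a lower sensitivity because the two metrics measure different things: AUC scores how well binding residues are ranked above non-binding ones, while sensitivity depends on the decision threshold. The sketch below illustrates this with made-up scores. None of the numbers come from the paper, and the names `method_a`, `method_b` and the 0.5 threshold are assumptions for illustration.

```python
def sensitivity(y_true, scores, threshold=0.5):
    """Fraction of true binding residues (label 1) scored at or above the threshold."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    return sum(s >= threshold for s in positives) / len(positives)

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels: 3 binding residues (1) and 5 non-binding residues (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical method A: perfect ranking, but every score is below 0.5.
method_a = [0.45, 0.40, 0.35, 0.30, 0.20, 0.10, 0.10, 0.05]
# Hypothetical method B: one negative outranks two positives, scores are higher.
method_b = [0.90, 0.60, 0.30, 0.80, 0.40, 0.20, 0.10, 0.05]

for name, scores in [("A", method_a), ("B", method_b)]:
    print(name, round(auc(y_true, scores), 3), round(sensitivity(y_true, scores), 3))
```

In this toy case method A ranks every binding residue above every non-binding one yet detects none of them at the chosen threshold, while method B has the lower AUC but finds two of the three binding residues.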

4. Summary

In this paper, the authors proposed a method that transforms protein feature data into a "visual" feature matrix and feeds it into a deep CNN framework for training, which achieved good results in the actual prediction of protein-peptide binding sites. At the end of the paper, the authors also put forward some ideas for improving the method. First, performance might be improved by changing the order of the feature groups. Second, performance might be improved by increasing the size of the sliding window or adopting a more complex network structure. These are all possible directions for improvement, and their actual effect still needs to be confirmed by experiments.
